This notebook provides a template for you to implement, in stages, the functionality required to successfully complete this project. If additional code is needed that cannot be included in the notebook, make sure the Python code is successfully imported and included in your submission. Sections whose headers begin with 'Implementation' indicate where you should begin your implementation. Some implementation sections are optional and are marked with 'Optional' in the header.
In addition to implementing code, there will be questions that you must answer which relate to the project and your implementation. Each section where you will answer a question is preceded by a 'Question' header. Carefully read each question and provide thorough answers in the following text boxes that begin with 'Answer:'. Your project submission will be evaluated based on your answers to each of the questions and the implementation you provide.
Note: Code and Markdown cells can be executed using the Shift + Enter keyboard shortcut. In addition, Markdown cells can be edited by double-clicking the cell to enter edit mode.
Visualize the German Traffic Signs Dataset. This is open ended; some suggestions include plotting traffic sign images, plotting the count of each sign, etc. Be creative!
The pickled data is a dictionary with 4 key/value pairs:
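One quick way to check those key/value pairs is to load the file and print its keys. The sketch below uses a toy in-memory dict in place of the real `train.p`; only the `'features'` and `'labels'` keys are confirmed by the code later in this notebook, so treat the rest of the structure as an assumption.

```python
import io
import pickle
import numpy as np

# Toy stand-in for train.p: a dict whose 'features' and 'labels'
# entries match the fields used throughout this notebook.
toy = {
    "features": np.zeros((2, 32, 32, 3), dtype=np.uint8),  # images
    "labels": np.array([0, 1]),                            # class ids
}
buf = io.BytesIO(pickle.dumps(toy))

data = pickle.load(buf)
print(sorted(data.keys()))     # ['features', 'labels']
print(data["features"].shape)  # (2, 32, 32, 3): (n, height, width, channels)
```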
# Load pickled data
import pickle
import numpy as np
import urllib.request
# TODO: fill this in based on where you saved the training and testing data
training_file = urllib.request.urlopen("https://www.dropbox.com/s/vfdp152tvhdhnib/train.p?dl=1#")
testing_file = urllib.request.urlopen("https://www.dropbox.com/s/vfdp152tvhdhnib/test.p?dl=1#")
train = pickle.load(training_file)
test = pickle.load(testing_file)
X_train, y_train = train['features'], train['labels']
X_test, y_test = test['features'], test['labels']
### To start off let's do a basic data summary.
# TODO: number of training examples
n_train = len(X_train)
# TODO: number of testing examples
n_test = len(X_test)
# TODO: what's the shape of an image?
image_shape = X_train[0].shape
# TODO: how many classes are in the dataset
n_classes = len(np.unique(y_train))
print("Number of training examples =", n_train)
print("Number of testing examples =", n_test)
print("Image data shape =", image_shape)
print("Number of classes =", n_classes)
%matplotlib inline
import matplotlib.pyplot as plt
classes, counts = np.unique(y_train, return_counts=True)
plt.bar(classes, counts)
plt.show()
# EXAMPLES: show one image per class
printed = set()
for i, image in enumerate(X_train):
    if y_train[i] not in printed:
        print(y_train[i])
        plt.figure()
        plt.imshow(image)
        plt.show()
        printed.add(y_train[i])
Design and implement a deep learning model that learns to recognize traffic signs. Train and test your model on the German Traffic Sign Dataset.
There are various aspects to consider when thinking about this problem:
Here is an example of a published baseline model on this problem. You are not required to be familiar with the approach used in the paper, but it is good practice to try to read papers like these.
Use the code cell (or multiple code cells, if necessary) to implement the first step of your project. Once you have completed your implementation and are satisfied with the results, be sure to thoroughly answer the questions that follow.
Describe the techniques used to preprocess the data.
Answer:
To preprocess the data, I first converted all of the images to grayscale to simplify the pipeline and reduce the impact of varying lighting conditions. I then normalized each flattened image (scikit-learn's Normalizer rescales each sample to unit norm) so the inputs are on a comparable scale. Finally, I one-hot encoded the labels to make them compatible with the network's softmax output.
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, Normalizer
from PIL import Image

# Convert each RGB image to a flattened grayscale vector.
features = []
for image in X_train:
    features.append(np.array(Image.fromarray(image, 'RGB').convert('L').getdata()))

# Rescale each sample to unit norm.
normalizer = Normalizer()
features = normalizer.fit_transform(np.array(features))

# One-hot encode the labels (OneHotEncoder expects a 2-D column of labels).
encoder = OneHotEncoder(sparse=False)
labels = encoder.fit_transform(np.array(y_train).reshape(-1, 1))
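For reference, scikit-learn's `Normalizer` rescales each sample (row) to unit norm; per-feature zero-mean, unit-variance standardization is a common alternative. The sketch below shows that alternative on synthetic data; `StandardScaler` here is an illustration, not what this notebook's pipeline uses.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the grayscale feature matrix built above:
# rows are flattened 32x32 images (1024 pixel intensities each).
rng = np.random.default_rng(0)
features = rng.integers(0, 256, size=(64, 1024)).astype(np.float64)

# StandardScaler gives each pixel column zero mean and unit variance,
# unlike Normalizer, which rescales each row to unit L2 norm.
standardized = StandardScaler().fit_transform(features)

print(np.allclose(standardized.mean(axis=0), 0.0))  # True
print(np.allclose(standardized.std(axis=0), 1.0))   # True
```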
Describe how you set up the training, validation and testing data for your model. If you generated additional data, why?
Answer: I used a simple train/test split to divide the original training data into training and validation sets. By default, the function puts 75% of the data into the training set and 25% into the validation set. I think this is reasonable: 25% is large enough to give a reliable accuracy estimate, while still leaving the model enough data to train on.
X_train, x_validate, y_train, y_validate = train_test_split(features, labels)
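`train_test_split` defaults to a 75/25 split; passing `test_size` makes that explicit, and `random_state` makes the split reproducible. A small sketch on synthetic arrays (the variable shapes mirror this notebook, but the data is made up):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the preprocessed features and one-hot labels.
features = np.arange(100 * 4).reshape(100, 4)
labels = np.eye(4)[np.arange(100) % 4]

# Explicit 75/25 split, reproducible via random_state.
X_tr, X_val, y_tr, y_val = train_test_split(
    features, labels, test_size=0.25, random_state=0)

print(len(X_tr), len(X_val))  # 75 25
```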
What does your final architecture look like? (Type of model, layers, sizes, connectivity, etc.) For reference on how to build a deep neural network using TensorFlow, see Deep Neural Network in TensorFlow from the classroom.
Answer: I relied heavily on this tutorial: https://www.tensorflow.org/versions/r0.11/tutorials/mnist/pros/index.html I created a convolutional neural network:
- First layer: reshapes the flattened grayscale representation back into a 32x32 image, then applies a 5x5 convolution with 32 features per patch, followed by 2x2 max pooling.
- Second layer: same structure as the first, but with 64 features per patch.
- Third layer: fully connected; takes the flattened output of the second layer and produces 1024 features, followed by dropout.
- Readout layer: maps the 1024 features to 43 outputs, one per sign class.
import tensorflow as tf
from tqdm import tqdm
sess = tf.InteractiveSession()
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')
#placeholder data
x = tf.placeholder(tf.float32, shape=[None, 1024])
y_ = tf.placeholder(tf.float32, shape=[None, 43])
#first layer
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
x_image = tf.reshape(x, [-1,32,32,1])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
#second layer
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
#final layer
W_fc1 = weight_variable([8 * 8 * 64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 8*8*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
#dropout
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
#readout layer
W_fc2 = weight_variable([1024, 43])
b_fc2 = bias_variable([43])
y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
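The `8 * 8 * 64` flatten size above follows from two rounds of 2x2 max pooling on a 32x32 input (the SAME-padded convolutions preserve spatial size). A quick arithmetic check in plain Python, no TensorFlow needed:

```python
# SAME-padded conv keeps spatial size; each 2x2 max-pool halves it.
h = w = 32             # input image size
h, w = h // 2, w // 2  # after first max_pool_2x2  -> 16x16 (x32 channels)
h, w = h // 2, w // 2  # after second max_pool_2x2 -> 8x8   (x64 channels)
flat = h * w * 64      # flattened size feeding the fully connected layer
print(flat)            # 4096, i.e. 8 * 8 * 64
```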
How did you train your model? (Type of optimizer, batch size, epochs, hyperparameters, etc.)
Answer: I tested batch sizes of 64, 128, and 256, and settled on 128. I tried several of the other TensorFlow optimizers, but found that AdamOptimizer yielded the best accuracy. Performance stopped improving at around 10,000 iterations with a learning rate of 0.0001, which was low enough that I didn't see noticeable overfitting. I also tested other combinations of learning rates and iteration counts, but this combination worked best.
from sklearn.utils import shuffle

def next_batch(size):
    # Sample a random batch of `size` examples from the training set.
    x, y = shuffle(X_train, y_train, n_samples=size)
    return x, y

batch_size = 128
iterations = 10000
learning_rate = 0.0001
# 0.974034 — validation accuracy achieved with these settings
#training
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y_conv, labels=y_))
train_step = tf.train.AdamOptimizer(learning_rate).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
sess.run(tf.initialize_all_variables())
for i in tqdm(range(iterations)):
    batch = next_batch(batch_size)
    if i % 100 == 0:  # periodic progress updates
        train_accuracy = accuracy.eval(feed_dict={
            x: batch[0], y_: batch[1], keep_prob: 1.0})
        print("step %d, training accuracy %g" % (i, train_accuracy))
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
# testing (on validation set)
print("validation accuracy %g" % accuracy.eval(feed_dict={
    x: x_validate, y_: y_validate, keep_prob: 1.0}))
Testing: With >97% accuracy on the validation set, I decided it was time to evaluate on the test set. Surprisingly, performance was even better than on the validation set: 99.4%.
# Preprocess the test set with the already-fitted normalizer and encoder.
features = []
for image in X_test:
    features.append(np.array(Image.fromarray(image, 'RGB').convert('L').getdata()))
features = normalizer.transform(np.array(features))

labels = encoder.transform(np.array(y_test).reshape(-1, 1))

print("test accuracy %g" % accuracy.eval(feed_dict={
    x: features, y_: labels, keep_prob: 1.0}))
Testing: When applied to real images, the model performed dismally (at least on these images). Out of the five that I tested, it classified only one (do not enter) correctly.
What approach did you take in coming up with a solution to this problem?
Answer: I approached the problem in a methodical way, testing different combinations of hyperparameters, optimizers, and pooling to arrive at the best implementation possible for the validation data.
Take several pictures of traffic signs that you find on the web or around you (at least five), and run them through your classifier on your computer to produce example results. The classifier might not recognize some local signs but it could prove interesting nonetheless.
You may find signnames.csv useful as it contains mappings from the class id (integer) to the actual sign name.
As you can see below, out of the 5 test images I used, only one (do not enter) was classified correctly; for the others, performance was quite dismal. I am not sure why, or how the model could score 99% on the test set but only 20% on these images.
Image 1- 20kph, predicted yield. I would think this would be easy to classify, because it's nothing but the sign. I'm not sure why it didn't work. The model was very certain about its prediction, but 20kph didn't appear in the top 5.
Image 2- 70kph, predicted general caution. This sign is photographed from a weird angle (not one you'd see from a car) so it makes sense that it wouldn't predict properly. The model was somewhat certain about its prediction, and 70kph appeared as the third prediction.
Image 3- no entry, predicted no entry. I thought this was pretty straightforward, and the classifier got it right. The model was very certain about this prediction.
Image 4- pedestrian, predicted 30kph. The sign is in full view, and I have no idea why the model got it so wrong. It wasn't very certain about the prediction, and pedestrian didn't appear in the top 5.
Image 5- pedestrian, predicted general caution. This is an American sign, so it is unsurprising that a classifier trained on German data struggled with it. The model was fairly certain about its prediction, but pedestrian didn't appear in the top 5.
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import csv
%matplotlib inline

# Map class ids to human-readable sign names.
reader = csv.DictReader(open("signnames.csv", "r"))
mapping = {}
for row in reader:
    mapping[row["ClassId"]] = row["SignName"]

for filename in ["20kph.jpg", "70kph.jpg", "do not enter.jpg",
                 "german-pedestrian.jpg", "pedestrian2.jpg"]:
    image = mpimg.imread(filename)
    plt.figure()
    plt.imshow(image)
    plt.show()

    # Apply the same grayscale + normalization preprocessing as training;
    # reshape to a single-sample batch for the normalizer and the network.
    image = np.array(Image.fromarray(image, 'RGB').convert('L').getdata()).reshape(1, -1)
    image = normalizer.transform(image)

    softmax_probabilities = tf.nn.softmax(
        y_conv.eval(feed_dict={x: image, keep_prob: 1.0}, session=sess))
    print("softmax probabilities:")
    plt.figure()
    plt.plot(softmax_probabilities.eval(session=sess)[0])
    plt.show()

    top_five = sess.run(tf.nn.top_k(softmax_probabilities, k=5))
    for i, signtype in enumerate(top_five[1][0]):
        print("Prediction number " + str(i + 1) + ": " + mapping[str(signtype)])